KMID : 1022420200120040063
Phonetics and Speech Sciences
2020 Volume.12 No. 4 p.63 ~ p.71
End-to-end speech recognition models using limited training data
Kim June-Woo

Jung Ho-Young
Abstract
Speech recognition is one of the areas actively commercialized using deep learning and machine learning techniques.
However, most speech recognition systems on the market are developed on data with limited speaker diversity and tend to perform well only on typical adult speakers. This is because most speech recognition models are trained on speech databases collected from adult males and females. As a result, they often have difficulty recognizing the speech of the elderly, children, and speakers of dialects. Solving these problems may require a large database or the collection of additional data for speaker adaptation. Instead, this paper proposes a new end-to-end speech recognition method consisting of an acoustic augmented recurrent encoder and a transformer decoder with linguistic prediction. The proposed method can yield reliable acoustic and language model performance under limited-data conditions. It was evaluated on the recognition of Korean elderly and children's speech with a limited amount of training data and showed better performance than a conventional method.
KEYWORD
speech recognition, end-to-end model, small-data speech recognition
Listed journal information
Korea Research Foundation (KCI)